library("astsa")

March 28, 2016

1.1: Introduction

The analysis of experimental data that have been observed at different points in “time” (usually equally spaced).

The obvious correlation introduced by sampling adjacent points in time can severely restrict the applicability of conventional statistical methods that assume independent and identically distributed observations.

The first step to analysis is to plot the data as a function of time.

The two statistical methods used to analyze time series data are the

  1. time domain approach
  2. frequency domain approach


Methodology

Assumption: this pattern continues into the future.


Components

  1. Trend
    • The long-run upward or downward movement of the series.
  2. Cycle
    • The upward and downward movement of the series around the trend (think of a wave).
  3. Seasonal variations
    • The yearly (weekly, quarterly, …) patterns in the data (think of temperature).
  4. Irregular fluctuations
    • The erratic movements in the series that cannot be accounted for by a pattern.

The goal is to have only these irregular fluctuations left after the modeling is done (white noise).
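These components can be inspected with R's built-in decompose() function. As a quick illustration (using the AirPassengers dataset that ships with R, not a series from these notes; the log makes the seasonal component roughly additive):

# decompose() splits a monthly series into trend, seasonal,
# and irregular (random) components via moving averages
plot(decompose(log(AirPassengers)))

The "random" panel corresponds to the irregular fluctuations; ideally the residuals of a fitted model look like that panel, with no leftover pattern.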

1.2: The Nature of Time Series Data

Example Johnson & Johnson Quarterly Earnings
These are quarterly earnings over 84 quarters (21 years) from 1960 to 1980.

library("astsa")
#Example 1.1: JandJ Quarterly Earnings
plot(jj, type="o", ylab="Quarterly Earnings per Share")

#log transformation:
plot(log(jj), type="o", ylab="Quarterly Earnings per Share")

Example Global Warming
The data are global mean land-ocean temperature index from 1880 to 2009, with base period 1951 - 1980. These are deviations, measured in degrees centigrade, from the 1951 - 1980 average.

#Example 1.2: Global Warming
plot(gtemp, type="o", ylab="Global Temperature Deviations")

Example Speech Data
This is an example motivated by computer recognition of speech. The data show a 0.1-second sample of recorded speech for the phrase “aaa … hhh”; note the repetitive nature of the signal and the regular periodicities.

#Example 1.3: Speech Data
plot(speech)

Example NY Stock Exchange
A financial time series: the daily returns of the NYSE from February 2, 1984 to December 31, 1991.

#Example 1.4: NYSE
plot(nyse, ylab="NYSE Returns")

Example El Nino and Fish Population
Looking at multiple time series simultaneously. These data represent monthly values of an environmental series called the Southern Oscillation Index (SOI) and associated recruitment (an index of the number of new fish). Both series span 453 months, ranging from 1950 to 1987.

SOI measures changes in air pressure related to sea surface temperatures in the Central Pacific Ocean.

#Example 1.5: El Nino
par(mfrow = c(2,1)) # set up the graphics
plot(soi, ylab="", xlab="", main="Southern Oscillation Index")
plot(rec, ylab="", xlab="", main="Recruitment")

Example fMRI Imaging
Here we consider a collection of independent series, or a vector of series, generated under varying experimental conditions or treatment configurations: data collected from various locations on the brain via functional magnetic resonance imaging (fMRI). Five subjects were given periodic brushes on the hand. The stimulus was applied for 32 seconds and then stopped for 32 seconds. The sample rate was one observation every 2 seconds for 256 seconds (\(n=128\)). The series are consecutive measures of blood oxygenation-level dependent (BOLD) signal intensity, which measures the activity of the brain.

#Example 1.6: fMRI Imaging
par(mfrow=c(2,1), mar=c(3,2,1,0)+.5, mgp=c(1.6,.6,0))
ts.plot(fmri1[,2:5], col=1:4, lty=c(1,2,4,5), ylab="BOLD", 
        xlab="", main="Cortex")
ts.plot(fmri1[,6:9], col=1:4, lty=c(1,2,4,5), ylab="BOLD", 
        xlab="", main="Thalamus & Cerebellum")
mtext("Time (1 pt = 2 sec)", side=1, line=2)

Wednesday, March 30, 2016
Example Earthquakes and Explosions
This represents two phases (or arrivals) along the surface, denoted \(P\) (\(t=1, \dots, 1024\)) and \(S\) (\(t=1025, \dots, 2048\)), at a seismic recording station. The recording instruments in Scandinavia are observing earthquakes and mining explosions. A general problem of interest is to distinguish or discriminate between waveforms generated by earthquakes and those generated by explosions.

library(astsa)
par(mfrow=c(2, 1))
plot(EQ5, main="Earthquake")
plot(EXP6, main="Explosion")

1.3: Time Series Statistical Models

par(mfrow=c(1,1))
w <- rnorm(500)
plot.ts(w, main="White Noise")

v <- filter(x=w, filter=c(1/3, 1/3, 1/3), sides=2)
par(mfrow=c(2,1))
plot.ts(w, main="White Noise")
plot.ts(v, ylim=c(-3, 3), main="Moving Average")

x <- filter(w, filter=c(1, -.9), method='recursive')
plot.ts(x, main='AR(2)')


Note that the beginning of the series looks odd. For this reason, we usually introduce a “burn-in” period by adding extra points at the beginning and removing them after the fit.

AR(2) stands for an autoregressive model of order 2.

w.new <- rnorm(550) # Add 50 to avoid startup problems
x.new <- filter(w.new, filter=c(1, -.9), method='recursive')[-(1:50)]
plot.ts(x.new, main="AR(2)")

set.seed(123)
w <- rnorm(200)
x <- cumsum(w) # random walk w/o drift
wd <- w + .3
xd <- cumsum(wd) # random walk w/ drift
par(mfrow=c(1,1))
plot.ts(x, ylim=c(-10, 70), main="a random walk")
lines(xd, col='red') # w/ drift

The Basics

Let the value of the series at some time point \(t\) be \(x_t\). Then the observed values can be represented as \(x_1\), the initial time point, \(x_2\), the second time point, and so forth, up until \(x_n\), the last observed point.

1.4: Measures of Dependence

Definition The mean function is defined as \[ \mu_{xt} = E(x_t) \] provided it exists, where \(E\) denotes the usual expectation operator.

The notation specifies which series and at which time we are finding the mean for.

If we only have one series, we would simplify the notation to \(\mu_t\).

Example (Moving Average) Note, the mean of a white noise series is zero at all \(t\), \[ \mu_{wt}=E(w_t) = 0 \ \ \ \forall t \] If we look at the previous moving average example, the mean function is \[ E(v_t) = \frac{1}{3}[E(w_{t-1}) + E(w_t) + E(w_{t+1})] = 0 \] Note, this happens to not be a function of time.

Problem (Autoregressive) For the previous autoregressive example (\(x_t = x_{t-1} - 0.9x_{t-2} + w_t\)), we are assuming the mean is constant. Find the mean function.
Solution If the mean is constant, then \(E(x_i) = \mu\) for all \(i\). \[ \begin{aligned} E(x_t) &= E[x_{t-1} - 0.9x_{t-2} + w_t] \\ \mu &= E[x_{t-1}] - 0.9E[x_{t-2}] + E[w_t] \\ \mu &= \mu - 0.9\mu + 0 \\ \mu &= 0.1 \mu \\ 0.9\mu &= 0 \\ \mu &= 0 \end{aligned} \]

Example In the previous problem, consider adding a constant to the model: \[ x_t = c + x_{t-1} - 0.9 x_{t-2} + w_t \] Then the mean function becomes \[ \begin{aligned} \mu &= E(x_t) \\ &= E(c + x_{t-1} - 0.9 x_{t-2} + w_t) \\ &= E(c) + E(x_{t-1}) - 0.9 E(x_{t-2}) + E(w_t) \\ &= c + \mu - 0.9 \mu + 0 \\ \mu(1 - 1 + 0.9) &= c \\ \mu &= \frac{c}{0.9} \end{aligned} \] The result is counterintuitive. If the mean function is zero, and we add a constant \(c\), then we would expect that the new mean function would be \(c\).
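A quick simulation supports this result. Here the constant is set to \(c = 0.9\), so the implied mean is \(c/0.9 = 1\); the burn-in trick from earlier is used to discard startup effects:

set.seed(1)
c0 <- 0.9 # implies mean c0/0.9 = 1
w <- rnorm(10500)
# x_t = c0 + x_{t-1} - 0.9 x_{t-2} + w_t, with a 500-point burn-in
x <- filter(c0 + w, filter=c(1, -0.9), method="recursive")[-(1:500)]
mean(x) # should be near 1, not near c0 = 0.9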

Definition The autocovariance function is defined as the second moment product \[ \begin{aligned} \gamma_x (s, t) &= \text{cov}(x_s, x_t) \\ &= E[(x_s - \mu_s)(x_t - \mu_t)] \end{aligned} \] for all \(s\) and \(t\).
\(\gamma_x(s,t)=\gamma_x(t,s)\).

The autocovariance measures the linear dependence between two points on the same series, possibly observed at different times.

When \(s=t\), we have the variance at time \(t\).

\[ \gamma(t,t) = E[(x_t - \mu_t)^2] = \text{Var}(x_t) \]

As with most variances, this quantity is usually unknown.

We would have a problem with estimability if we let the variance at time \(t\) be different for all \(t\) (can’t estimate variance with one observation).

Monday, April 4, 2016

Example White Noise: For the white noise series \(w_t\), \[ \begin{aligned} \gamma_w(s,t) &= \text{Cov}(w_s, w_t) \\ &= E[(w_s - \mu_{ws})(w_t - \mu_{wt})] \\ &= E[(w_s - 0)(w_t - 0)] \\ &= E(w_s w_t) \\ &= \begin{cases} \sigma_w^2, & s=t \\ 0, & s \neq t \end{cases} \end{aligned} \]

Example Moving Average: Find the autocovariance of the moving average series from the previous example \[ v_t = \frac{1}{3}(w_{t-1} + w_t + w_{t+1}) \] \[ \begin{aligned} \gamma_v(s,t) &= E[(v_s - 0)(v_t - 0)] \\ &= E\left[\left(\frac{w_{s-1}}{3} + \frac{w_s}{3} + \frac{w_{s+1}}{3} \right)\left(\frac{w_{t-1}}{3} + \frac{w_t}{3} + \frac{w_{t+1}}{3} \right)\right] \\ &= E\bigg(\frac{w_{s-1} w_{t-1}}{9} + \frac{w_{s-1} w_t}{9} + \frac{w_{s-1} w_{t+1}}{9} + \\ & \ \ \ \ \ \ \ \ \ \ \ \frac{w_s w_{t-1}}{9} + \frac{w_s w_t}{9} + \frac{w_s w_{t+1}}{9} + \\ & \ \ \ \ \ \ \ \ \ \ \ \frac{w_{s+1} w_{t-1}}{9} + \frac{w_{s+1} w_t}{9} + \frac{w_{s+1} w_{t+1}}{9}\bigg) \\ \end{aligned} \] When the subscripts don’t match, the expected value of the product is zero. As a result, this autocovariance function yields: \[ \gamma_v(s,t) = \begin{cases} \frac{3}{9} \sigma^2_w, & s=t \\ \frac{2}{9} \sigma^2_w, & |s-t|=1 \ \ \text{(i.e. } s=t+1 \text{ or } s=t-1\text{)} \\ \frac{1}{9} \sigma^2_w, & |s-t|=2 \\ 0, & |s-t| \geq 3 \end{cases} \] This is interesting because it doesn’t matter where in the series you are; only the distance between the points matters.

Problem Random Walk: Find the autocovariance of the random walk with drift. Use \(x_t = \delta t + \sum_{j=1}^t w_j\).
Solution Recall the mean function: \(\mu_t = E(x_t) = \delta t\)
\[ \begin{align*} \gamma_x(s,t) &= E[(x_s - \delta s)(x_t - \delta t)] \\ &= E\Big[\big([\delta s + \sum_{j=1}^s w_j] - \delta s \big)\big([\delta t + \sum_{i=1}^t w_i] - \delta t \big)\Big] \\ &= E \left[\sum_{j=1}^s w_j \sum_{i=1}^t w_i \right] \\ &= E(w_1 w_1 + w_1w_2 + \cdots + w_sw_t) \\ &= \min(s,t)\,\sigma^2_w \end{align*} \] The last equality comes from the fact that if the subscripts aren’t equal, then the expected value is zero. The number of terms with matching subscripts is the smaller of the two counts, \(s\) and \(t\).

Note that \(\delta s\) and \(\delta t\) cancelled in our derivation, showing that the autocovariance function is the same without the drift.
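A small Monte Carlo check of \(\gamma_x(s,t) = \min(s,t)\,\sigma^2_w\) (here with \(\sigma^2_w = 1\)):

set.seed(1)
# each column is one random walk of length 30 (no drift)
walks <- replicate(10000, cumsum(rnorm(30)))
cov(walks[10, ], walks[20, ]) # should be near min(10, 20) = 10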

Definition The autocorrelation function (ACF) is defined as \[ \rho (s,t) = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s) \gamma(t,t)}} \] As before, this is a measure of linear dependence and can be viewed as a measure of linear predictability of the series at time \(t, x_t\), using only \(x_s\).

Example Moving Average: Find the ACF \[ \rho(s,t) = \begin{cases} 1, & |s-t| = 0 \\ \frac{2}{3}, & |s-t| = 1 \\ \frac{1}{3}, & |s-t| = 2 \\ 0, & |s-t| \geq 3 \end{cases} \] The equation comes from our derived values in the moving average example for autocovariance. For example, for \(|s-t|=1\), \[ \begin{aligned} \rho (s,t) &= \frac{\gamma(s,t)}{\sqrt{\gamma(s,s) \gamma(t,t)}} \\ &= \frac{\frac{2}{9} \sigma_w^2}{\sqrt{\frac{1}{3} \sigma_w^2 \cdot \frac{1}{3} \sigma_w^2}} \\ &= \frac{\frac{2}{9} \sigma_w^2}{\frac{1}{3} \sigma_w^2} \\ &= \frac{2}{3} \end{aligned} \] The ability to make predictions between time points depends on the distance: the further apart the points, the more difficult the prediction. At a distance of 3 or more, we have no information to make a prediction.
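The derived ACF can be checked against a simulated series (na.pass is needed because filter() leaves NAs at the endpoints):

set.seed(1)
w <- rnorm(10000)
v <- filter(w, sides=2, filter=rep(1/3, 3)) # three-point moving average
# sample ACF should be near 1, 2/3, 1/3, 0, 0
acf(v, lag.max=4, plot=FALSE, na.action=na.pass)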

Definition The cross-covariance function between two series \(x_t\) and \(y_t\), is \[ \begin{aligned} \gamma_{xy}(s,t) &= \text{Cov}(x_s, y_t) \\ &= E[(x_s - \mu_{xs})(y_t - \mu_{yt})] \\ \end{aligned} \] This is meant to help one to measure the predictability of one series based on another.

We can scale this function so that we have measurement that lies between -1 and 1, which is called the cross-correlation function.

Definition The cross-correlation function (CCF) is defined as \[ \rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s) \gamma_y(t,t)}} \]

1.5: Stationary Time Series

A strictly stationary time series is one in which the statistical behavior of \(x_{t_1}, x_{t_2}, \dots, x_{t_k}\) is identical to that of the shifted set \(x_{t_1 + h}, x_{t_2 + h}, \dots, x_{t_k + h}\) for any collection of time points \(t_1, t_2, \dots, t_k\) and for any shift (lag) \(h\). Unfortunately, this definition is too strong for most applications.

There exists a weaker version that only puts constraints on the first two moments.

Definition A weakly stationary time series is a finite variance process where the following two conditions must be met

  1. the mean value function, \(\mu_t\), is a constant and does not depend on time \(t\).
    • So \(E(x_t) = \mu\) for all \(t\).

  2. the autocovariance function \(\gamma(s,t)\) depends on \(s\) and \(t\) only through their difference \(|s-t|=h\).
    • Also, because the specific values of \(s\) and \(t\) are irrelevant (only the difference matters), then we can simplify the notation. Let \(s=t+h\) where \(h\) is the lag time. Then \[ \begin{align*} \gamma(s,t) &= \gamma(t+h, t) \\ &= \text{Cov}(x_{t+h}, x_t) \\ &= \text{Cov}(x_h, x_0) \\ &= \gamma(h, 0) \end{align*} \] which is true because the covariance of two values at time \(t\) and \(t+h\) is the same as if those values were at \(0\) and \(h\). To simplify further, \(\gamma(h,0) \equiv \gamma(h)\).

An alternative definition of weakly stationary (from PSU):

  1. The mean \(E(x_t)\) is the same for all \(t\).
  2. The variance of \(x_t\) is the same for all \(t\).
  3. The covariance (and correlation) between \(x_t\) and \(x_{t-h}\) is the same for all \(t\).

Any reference to a stationary time series hereafter is referring to a weakly stationary time series.

Definition 1.8 The autocovariance function of a stationary time series is written as
\[ \gamma_x(h) = E[(x_{t+h} - \mu)(x_t - \mu)] \]

Definition 1.9 The autocorrelation function (ACF) of a stationary time series becomes \[ \begin{align*} \rho_x(h) &= \frac{\gamma(t+h, t)}{\sqrt{\gamma(t+h, t+h) \gamma(t, t)}} \\ &= \frac{\gamma_x(h)}{\gamma_x(0)} \end{align*} \] \(\rho_x(h)\) still remains between \(-1\) and \(1\).

Note that \(\gamma(h)=\gamma(-h)\) since \[ \begin{aligned} \gamma(h) &= \gamma(t+h, t) \\ &= E[(x_{t+h} - \mu)(x_t - \mu)] \\ &= E[(x_t - \mu)(x_{t+h} - \mu)] \\ &= \gamma(t, t+h) \\ &= \gamma(-h) \end{aligned} \] Because of this property, plotting the ACF and ACovF with respect to \(h\) does not require including the negative lags. The function is symmetric about \(h=0\).

Example Let \(w_t\) be white noise, that is \(w_t \stackrel{iid}{\sim} (0, \sigma^2_w = 1)\) and \(x_t = w_t - 0.9w_{t-1}\). This is called a first-order moving average series.
Here, \(E(x_t)=0 - 0.9 \cdot 0 =0\) and \(E(w^2_t)=1\)

(Aside: \(E(w^2_t)=1\) because if, by definition, \(\text{Var}(X)= E(X^2) - \mu^2\), then \(\sigma^2_w = E(w^2) - 0^2=1\) which means that \(E(w^2)=1\)).

Using the definition of a stationary time series ACovF (Def 1.8) \[ \begin{align*} \gamma_x(h) &= E[(x_{t+h} - \mu)(x_t - \mu)] \\ &= E(x_{t+h}x_t - \mu x_{t+h} - \mu x_t + \mu^2) \\ &= E(x_{t+h}x_t) - \mu E(x_{t+h}) - \mu E(x_t) + \mu^2 \\ &= E(x_{t+h}x_t) - 0 \cdot E(x_{t+h}) - 0 \cdot E(x_t) + 0 \\ &= E(x_{t+h}x_t) \\ &= E[(w_{t+h} - 0.9 w_{t+h-1})(w_t - 0.9 w_{t-1})] \\ &= E[w_{t+h}w_t -0.9 w_{t+h} w_{t-1} - 0.9 w_t w_{t+h-1} + 0.9^2 w_{t+h-1} w_{t-1}] \\ &= E(w_{t+h}w_t) -0.9 E(w_{t+h} w_{t-1}) - 0.9 E(w_t w_{t+h-1}) + 0.9^2 E(w_{t+h-1} w_{t-1}) \end{align*} \] When \(h=0\) we have \[ \begin{align*} \gamma_x(0) &= E(w_{t}w_t) -0.9 E(w_{t} w_{t-1}) - 0.9 E(w_t w_{t-1}) + 0.9^2 E(w_{t-1} w_{t-1}) \\ &= 1 - 0.9 \cdot 0 - 0.9 \cdot 0 + 0.9^2 \cdot 1 \\ &= 1 + 0.9^2 \end{align*} \] We can similarly plug in other values, which gives us the equation \[ \gamma_x(h) = \begin{cases} 1 + 0.9^2, & h=0 \\ -0.9, & h= \pm 1 \\ 0, & |h| \geq 2 \end{cases} \] For the ACF of a stationary time series, we have it defined as \(\rho_x(h)= \gamma_x(h) / \gamma_x (0)\).
When \(h=0\), we have \(\rho_x(0)= \gamma_x(0) / \gamma_x (0) = 1\). When \(h=\pm 1\), we have \(\rho_x(\pm 1) = \gamma_x(1)/\gamma_x(0) = -0.9/(1+0.9^2) = -0.9/1.81 \approx -0.497\). \[ \rho_x(h) = \begin{cases} 1, & h=0 \\ - \frac{0.9}{1.81}, & h = \pm 1 \\ 0, & |h| \geq 2 \end{cases} \]

How to simulate a series

# what this all means will be explained in a future lecture
# first order moving avg
foma <- arima.sim(model=list(order=c(0,0,1), ma=-0.9), n=200) # 200 obs
plot.ts(foma)

Problem Is a random walk weakly stationary?
Solution Recall that a random walk with drift takes the form \(x_t = \delta t + \sum_{j=1}^t w_j\), where the mean function is \(\mu_t = E(x_t) = \delta t\). Clearly the mean function depends on \(t\), so it is not stationary.

Without drift, the autocovariance is \(\gamma_x(t+h,t)=\min(t+h,t)\,\sigma_w^2\), which clearly depends on \(t\). So this is also not stationary.

Example Is \(x_t=\beta_0 + \beta_1 t + w_t\) weakly stationary?
Solution No, since \(E(x_t) = \beta_0 + \beta_1t\) which depends on \(t\).

Setting the mean function aside, consider the second criterion for a weakly stationary time series: is the autocovariance function dependent only on \(|s-t|=h\)? Here \(\gamma_x(s,t) = \text{Cov}(w_s, w_t)\), which is not a function of \(t\) itself. We may therefore consider the series as having stationary behavior around a linear trend, called trend stationarity.

April 06, 2016
Some simplification also occurs when looking at multiple time series, if we assume they are jointly stationary.

Definition 1.10 Two time series, say \(x_t\) and \(y_t\), are said to be jointly stationary if they are each stationary and the cross-covariance function \[ \begin{align*} \gamma_{xy}(h) &= \text{Cov}(x_{t+h}, y_t) \\ &= E[(x_{t+h} - \mu_x)(y_t - \mu_y)] \end{align*} \] is a function of only the lag, \(h\).

Scaling we get

Definition 1.11 The cross-correlation function (CCF) of jointly stationary time series \(x_t\) and \(y_t\) is defined as \[ \rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0) \gamma_y(0)}} \] Note that, in general, \(\rho_{xy}(h) \neq \rho_{xy}(-h)\). The order of the series matters.

However, it is the case that \[ \rho_{xy}(h) = \rho_{yx}(-h) \]

Problem Show \(\rho_{xy}(h) = \rho_{yx}(-h)\)

Solution It is sufficient to show that \(\gamma_{xy}(h) = \gamma_{yx}(-h)\) \[ \begin{align*} \gamma_{xy}(h) &= \gamma_{xy}(t+h-t) \\ &= E[(x_{t+h}-\mu_x)(y_t - \mu_y)] \\ &= E[(y_t - \mu_y)(x_{t+h} - \mu_x)] \\ &= \gamma_{yx}(t-(t+h)) \\ &= \gamma_{yx}(-h) \end{align*} \]
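This identity also holds exactly for the sample CCF, which can be confirmed in R using the $acf and $lag components of the object returned by ccf():

set.seed(1)
x <- rnorm(100); y <- rnorm(100)
rxy <- ccf(x, y, plot=FALSE)
ryx <- ccf(y, x, plot=FALSE)
# sample CCF of (x, y) at lag 3 equals that of (y, x) at lag -3
rxy$acf[rxy$lag == 3]
ryx$acf[ryx$lag == -3]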

Example Suppose that \(y_t = \beta x_{t - \ell} + w_t\) for \(\{x_t\}\) stationary and uncorrelated to \(\{w_t\}\), where \(w_t \sim WN(0, \sigma^2_w)\).
Then \[ \begin{align*} \gamma_{xy}(h) &= \text{Cov}(x_{t+h}, y_t) \\ &= \text{Cov}(x_{t+h}, \beta x_{t - \ell} + w_t) \\ &= \text{Cov}(x_{t+h}, \beta x_{t - \ell}) \\ &= \beta \text{Cov}(x_{t+h}, x_{t - \ell}) \\ &= \beta \gamma_x(h + \ell) \end{align*} \] Jointly stationary? Yes. To prove it, you must show \(x_t\) and \(y_t\) are stationary, and the cross-covariance function is not a function of time.

Incidentally, if \(\ell > 0\), we say \(x_t\) leads \(y_t\) and if \(\ell < 0\), we say \(x_t\) lags \(y_t\).

When \(x_t\) leads \(y_t\), the spike in the CCF will be at a negative lag (which is what we want).

set.seed(4866)  
x <- rnorm(100)
y <- lag(x, -5) + rnorm(100) #beta=1, rnorm(100) is wt's
ccf(x, y, ylab="CCovF", type="covariance")

Example In R. The graph shown is actually the estimated cross-covariance function, which we haven’t defined yet.

Time Series Relationships

Definition 1.12 A linear process, \(x_t\), is defined to be a linear combination of white noise variates, \(w_t\), and is given by \[ x_t = \mu + \sum_{j = - \infty}^{\infty} \psi_j w_{t - j} \] where \(\sum_{j = - \infty}^{\infty} |\psi_j| < \infty\).

Here, the mean function is \(E(x_t) = \mu\). The autocovariance function is \[ \gamma_x(h) = \sigma^2_w \sum_{j = - \infty}^{\infty} \psi_{j+h} \psi_j \ \ \text{for} \ h \geq 0 \] (use the moving average example from Monday to verify this)
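As suggested, the three-point moving average checks out: it is a linear process with \(\psi_{-1} = \psi_0 = \psi_1 = \frac{1}{3}\) and \(\psi_j = 0\) otherwise, so \[ \gamma_v(0) = \sigma^2_w \sum_j \psi_j^2 = \frac{3}{9}\sigma^2_w, \qquad \gamma_v(1) = \sigma^2_w (\psi_{-1}\psi_0 + \psi_0\psi_1) = \frac{2}{9}\sigma^2_w, \qquad \gamma_v(2) = \sigma^2_w \psi_{-1}\psi_1 = \frac{1}{9}\sigma^2_w, \] matching the autocovariances derived earlier.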

\(\{x_t\}\) is stationary.

What if \(j < 0\)? If, say, \(j=-2\), then the term \(w_{t-j}\) of \(x_t = \mu + \sum_{j = - \infty}^{\infty} \psi_j w_{t - j}\) becomes \(w_{t+2}\). That means \(x_t\) is modeled on something happening in the future, which doesn’t make much sense. So if \(\psi_j \neq 0\) for some \(j < 0\), then \(x_t\) depends on future values of \(w_t\).

As such, we will focus on processes that do not depend on the future, called causal, when \(\psi_j=0\) for \(j<0\).

Definition 1.13 A process, \(x_t\), is said to be a Gaussian process if the n-dimensional vectors \(\mathbf{x}=(x_{t_1}, x_{t_2}, \dots, x_{t_n})\), for every collection of distinct time points \(t_1, \dots, t_n\), have a non-singular multivariate normal distribution.

The non-singular property means that the variance-covariance matrix for \(\mathbf{x}\) is positive definite.

If the Gaussian process is stationary, then \(\mu_t = \mu\) and \(\gamma(t_i, t_j) = \gamma(|t_i - t_j|)\), so the mean vector and covariance matrix are independent of time.

1.6: Estimation of Correlation

Assuming Stationarity
If we do not have stationarity, we could possibly have \(n\) covariance functions to estimate (non-estimable). Therefore, the assumption of stationarity is critical for many problems.

If we have a stationary time series, then the mean function \(\mu_t=\mu\), and so we can estimate this using the sample mean \[ \hat{\mu} = \overline{x} = \frac{1}{n}\sum_{t=1}^n x_t \] The standard error of this estimator is not as trivial as it was before since the observations are correlated.

We have \[ \begin{align*} \text{Var}(\hat{\mu}) &= \text{Var}\left( \frac{1}{n} \sum_{t=1}^n x_t \right) \\ &= \frac{1}{n^2}\text{Cov}\left(\sum_{t=1}^n x_t, \sum_{s=1}^n x_s \right) \\ &= \frac{1}{n^2} [n \gamma_x(0) + (n-1)\gamma_x(1) + (n-2) \gamma_x(2) + \cdots + (n- (n-1))\gamma_x(n-1) + (n-1)\gamma_x(-1) + (n-2)\gamma_x(-2) + \cdots + (n-(n-1)) \gamma_x(1-n)] \\ &= \frac{1}{n} \sum_{h=-n}^n \left(1 - \frac{|h|}{n} \right) \gamma_x(h) \end{align*} \] However, this expression isn’t directly usable because \(\gamma(h)\) is almost never known. We need an estimate of \(\gamma(h)\).
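As a sanity check on this formula, consider the first-order moving average \(x_t = w_t - 0.9 w_{t-1}\) from earlier, where \(\gamma(0) = 1.81\), \(\gamma(\pm 1) = -0.9\), and \(\gamma(h)=0\) otherwise:

n <- 50
# formula reduces to (1/n) * [gamma(0) + 2*(1 - 1/n)*gamma(1)]
(1/n) * (1.81 + 2 * (1 - 1/n) * (-0.9)) # about 0.00092
set.seed(1)
xbars <- replicate(20000, mean(arima.sim(model=list(ma=-0.9), n=n)))
var(xbars) # Monte Carlo estimate; should be close to the value above

Note that the negative lag-1 autocovariance makes \(\text{Var}(\hat{\mu})\) much smaller than the iid value \(\gamma(0)/n = 0.0362\).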

The Sample Correlation Functions

One technique used for visualizing the relations between a series at different lags is the scatterplot matrix. This is too complicated to do manually.

Recall the Southern Oscillation Index data:

plot.ts(soi)

lag.plot(soi, lags=12, diag=F)

This is a useful preliminary analysis before diving deeply into the data. At the upper right-hand corner, it appears that there is a positive relationship for a lag of 1.

The ACF tells us whether a substantial linear relationship exists between a series and its own lagged values.

Since the true ACF is unknown, we utilize a sample version of the ACF \[ \hat{\rho}_x(h) = \frac{\hat{\gamma}_x(h)}{\hat{\gamma}_x(0)} \] where \[ \hat{\gamma}_x(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h}- \overline{x})(x_t - \overline{x}) \] with \(\hat{\gamma}(h) = \hat{\gamma}(-h)\) for \(h=0,1, \dots, n-1\).
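The estimator can be computed by hand and compared with R's acf(), which uses the same \(\frac{1}{n}\) divisor and demeans by the sample mean:

set.seed(1)
x <- rnorm(25); n <- length(x); h <- 1
g0 <- sum((x - mean(x))^2) / n # gamma-hat(0)
g1 <- sum((x[(1+h):n] - mean(x)) * (x[1:(n-h)] - mean(x))) / n # gamma-hat(1)
g1 / g0 # matches acf(x, plot=FALSE)$acf[2], the lag-1 value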

acf(soi) # autocorrelation function

At lag 0, the ACF is always 1. It appears cyclical, exhibiting seasonal fluctuations.

Note: this model is bad, as it doesn’t account for many variables. Later we’ll develop a better one.

April 11, 2016
\[ \hat{\gamma}_x(h) = \frac{1}{n}\sum^{n-h}_{t=1}(x_{t+h}-\overline{x})(x_t-\overline{x}) \] The sum runs over a restricted range since \(t+h\) cannot exceed \(n\).

Dividing by \(n\) (or \(n-h\)) will not result in an unbiased estimator.

Under the assumption that the process \(x_t\) is white noise, the approximate standard error of the sample ACF is \(\hat{\sigma}_{\hat{\rho}} \approx \frac{1}{\sqrt{n}}\).

If we further assume normality, this will yield a way to get an interval estimate of \(\rho_x(h)\) at any lag.

Example ACF of recruitment data.

acf(rec)

Could this be a white noise series? The blue lines represent approximately \(\pm 2/\sqrt{n}\). If this were a white noise series, nearly all of the sample ACF values (other than lag 0) would fall within the blue lines.

Recall, for white noise \[ \rho(h) = \begin{cases} 1, & h=0 \\ 0, & \text{elsewhere} \end{cases} \]

If there are significant lags (other than zero), this suggests the series is not white noise, and that there is more information to get from the series.

What the ACF reveals for white noise:

w <- rnorm(5000)
acf(w)

This is what we would want if we were to plot the residuals of a time-series model.

Large-Sample Distribution of the ACF
For large \(n\) and under mild conditions, the sample ACF is approximately normal with mean zero and standard deviation \(n^{-1/2}\).

So approximately 95% of the sample ACFs should be within \(\pm 2\frac{1}{\sqrt{n}}\) of zero. This can be used to assess the whiteness of a series.
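A quick check that the \(\pm 2/\sqrt{n}\) rule behaves as advertised for white noise:

set.seed(1)
n <- 1000
rho.hat <- acf(rnorm(n), lag.max=100, plot=FALSE)$acf[-1] # drop lag 0
mean(abs(rho.hat) > 2/sqrt(n)) # proportion outside the bounds; should be near 0.05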

Example (Moving Average) Consider the previous moving average model \[ v_t = \frac{1}{3} (w_{t-1} + w_t + w_{t+1}) \] and two realizations of it based on different sample sizes.

set.seed(4866)
w1 <- rnorm(12)
w2 <- rnorm(102)
v1 <- filter(x=w1, sides=2, rep(1/3, 3))[-c(1, 12)] # eliminate first and last observations
v2 <- filter(x=w2, sides=2, rep(1/3, 3))[-c(1, 102)]
plot.ts(v1)

plot.ts(v2)

acf(v1, lag.max=4, plot=FALSE)
## 
## Autocorrelations of series 'v1', by lag
## 
##      0      1      2      3      4 
##  1.000  0.611  0.386 -0.114 -0.262
acf(v2, lag.max=4, plot=FALSE)
## 
## Autocorrelations of series 'v2', by lag
## 
##      0      1      2      3      4 
##  1.000  0.615  0.229 -0.155 -0.167

Recall, the actual values are \(1, 2/3, 1/3, 0 , 0, \dots\) (see earlier example). Compare that to our observations above.

par(mfrow = c(2,1))
acf(v1, lag.max=4, plot=TRUE)
acf(v2, lag.max=4, plot=TRUE)

We may think that v1 is white noise, but our small sample size is the reason that the ACFs fall within the blue lines.

Problem Let \(x_t=0.9w_t - 0.1w_{t-1}\) where \(w_t\) is the usual white noise process with \(\sigma_w^2=1\). Find the true ACF, generate a random realization (\(n=500\)), and find the sample ACF.
Solution Theoretical autocovariance \[ \gamma(h) = \begin{cases} (0.81 + 0.01)\sigma_w^2, & h=0 \\ -0.09 \sigma^2_w, & |h|=1 \\ 0, & |h|>1 \end{cases} \]

\[ \rho(h) = \begin{cases} 1, & h=0 \\ -0.1098, & |h|=1 \\ 0, & |h|>1 \end{cases} \]

set.seed(123)
w <- rnorm(500)
x <- filter(w, sides=1, filter=c(0.9, -0.1))[-1] # eliminate first observation
plot.ts(x)

acf(x)


Examine actual values

acf(x, plot=FALSE)
## 
## Autocorrelations of series 'x', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000 -0.156 -0.048  0.015 -0.054 -0.002 -0.046  0.009  0.030  0.025 
##     10     11     12     13     14     15     16     17     18     19 
##  0.005  0.016 -0.106  0.004  0.023  0.082 -0.060  0.010 -0.053  0.015 
##     20     21     22     23     24     25     26 
## -0.005 -0.045  0.080  0.006 -0.008  0.012  0.000

The sample ACF at lag 1 is \(-0.156\), compared with the calculated \(\rho(1)=-0.1098\).

Recall, the cross-correlation function, defined as \[ \rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}} \] is a measure of the correlation between two series separated in time by lag \(h\).

Since this is usually unknown, we can use the sample cross-correlation function \[ \hat{\rho}_{xy}(h) = \frac{\hat{\gamma}_{xy}(h)}{\sqrt{\hat{\gamma}_x(0)\hat{\gamma}_y(0)}} \] where \(\hat{\gamma}_{xy}(h) = \frac{1}{n} \sum^{n-h}_{t=1} (x_{t+h} - \overline{x})(y_t - \overline{y})\).

Under the hypothesis that there is no relation at time \(h\) and that at least one of the two series is independent and identically distributed, the distribution of \(\hat{\rho}_{xy}(0)\) is approximately normal with mean zero and standard deviation \(\frac{1}{\sqrt{n}}\).

Partial Autocorrelation Function (PACF)

One can think of the PACF as the simple correlation between two points separated by a lag \(h\), say \(x_t\) and \(x_{t-h}\), with the effect of the intervening points \(x_{t-1}, x_{t-2} \dots, x_{t-h+1}\) conditioned out.

This is analogous to finding the partial correlation in regression between the response variable and one of the independent variables conditioning out the other independent variables (e.g. \(r_{3|21}\))

Suppose we want to predict \(x_t\) from \(x_{t-1}, \dots, x_{t-h}\) using some linear function. Consider minimizing the mean square prediction error \[ MSE = E[(x_t - \hat{x}_t)^2] \] using the predictor \[ \hat{x}_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_h x_{t-h} \] assuming, for convenience, that \(x_t\) has been adjusted to have mean zero.

Then the PACF is defined as the value of the last coefficient \(a_h\), i.e. \(\Phi_{hh}=a_h\).

The coefficients are between -1 and 1.

The standard error of the estimated PACF is \(\frac{1}{\sqrt{n}}\).

Note: the order of the autoregressive model will be exactly the lag \(h\) beyond which \(\Phi_{hh}=0\).
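This regression definition can be verified directly: fit the lag-\(h\) predictor by least squares and compare the last coefficient with pacf(). Here an arbitrary AR(2) with coefficients 0.5 and 0.2 (chosen for illustration, not from the notes) is simulated:

set.seed(1)
x <- arima.sim(model=list(ar=c(0.5, 0.2)), n=500)
x <- x - mean(x) # adjust to mean zero
# regress x_t on x_{t-1} and x_{t-2}; the last coefficient is a_2
fit <- lm(x[3:500] ~ 0 + x[2:499] + x[1:498])
coef(fit)[2] # approximately Phi_22 ...
pacf(x, plot=FALSE)$acf[2] # ... the sample PACF at lag 2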

Example (SOI cont’d) Note that the PACF of the SOI has a single peak at lag \(h=1\) and then relatively small values.

This means, in effect, that fairly good predictions can be achieved by using the immediately preceding point and that adding further values in the past does not really improve this predictability.

par(mfrow=c(2, 1))
acf(soi)
pacf(soi)

The lag axis has been rescaled because the data are monthly: a lag of 1.0 corresponds to one season (12 months), which is why 12 lags appear per unit of 1.0.

Example (Recruitment series)

acf(rec)

pacf(rec)

par(mfrow=c(1,1))

We see two spikes: the previous month and the month before that include information that could help predict future data.

(Aside: the PACF of white noise would not have any spikes - nearly all would fall within the blue lines)

Note: The ACF of a nonstationary time series decays very slowly as a function of lag. The PACF of a nonstationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

Most series are not stationary to begin with. In such cases, we must modify the series to improve the approximation to stationarity by detrending, differencing, transformations, and linear filtering.

April 13, 2016

Time Series Regression

Here, we relate the dependent variable \(y_t\) to functions of time.

Most useful when the coefficients of trend and seasonality are not functions of time.

Trend Model
The simplest model is one where there is only a trend component to the series. The general polynomial model is \[ y_t = \beta_0 + \beta_1 t + \beta_2t^2 + \cdots + \beta_p t^p + w_t \] We still require the usual assumptions on \(w_t\), where \(w_t \stackrel{iid}{\sim} N(0, \sigma^2_w)\).

We can use R to get an estimated model and prediction intervals.

Example (gtemp) Polynomial Fit. The poly() function generates the polynomial terms of the degree specified and all lower degrees.

ts.plot(gtemp) # reminder of what plot looks like

fit <- lm(formula=gtemp ~ poly(x=time(x=gtemp), degree=3, raw=TRUE), na.action=NULL) # raw=TRUE uses raw (non-orthogonal) polynomials
plot(resid(fit))

acf(resid(fit))  # exceed boundary lines indicates correlation - most likely that more than white noise is included in residuals

Using some information criteria, we can try to find an appropriate model.

AIC(fit) # unscaled AIC
## [1] -207.1945
AIC(fit)/length(gtemp) - log(2*pi) # true AIC but not necessary to use b/c comparisons are relative
## [1] -3.431681
AIC(fit, k=log(length(gtemp)))/length(gtemp) - log(2*pi) # BIC
## [1] -3.321391
n <- length(gtemp)
(AICc <- log(sum(resid(fit)^2)/n) + (n + 4)/(n - 4 - 2)) # AICc with k = 4 estimated parameters

Seasonal Variation
We can look at a couple of variations, constant or increasing.

Dummy Variables
- Used in regression to model categorical variables.
- Recall, if you have \(L\) factor levels, then we need \(L-1\) dummy variables.

The full model becomes \[ y_t = \beta_0 + \beta_1 t + \cdots + \beta_pt^p +\beta_{s1}x_{s1,t} + \beta_{s2}x_{s2,t} + \cdots + \beta_{s(L-1)}x_{s(L-1),t} + w_t \] The \(x\) values are 0 or 1, which tells us which seasonal component we are in at time \(t\), e.g. July.

Example (SOI)

month <- rep(1:12, 38)[-c(454:456)] # don't have data for last 3 months (1987)
dummies <- table(1:length(month), as.factor(month))
summary(lm(soi ~ time(soi) + dummies[,1:11])) # December is baseline
## 
## Call:
## lm(formula = soi ~ time(soi) + dummies[, 1:11])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.86157 -0.20401 -0.01132  0.22051  0.84104 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       13.490139   2.492629   5.412 1.03e-07 ***
## time(soi)         -0.006693   0.001266  -5.288 1.96e-07 ***
## dummies[, 1:11]1   0.041992   0.067797   0.619    0.536    
## dummies[, 1:11]2   0.094418   0.067796   1.393    0.164    
## dummies[, 1:11]3  -0.053419   0.067795  -0.788    0.431    
## dummies[, 1:11]4  -0.413335   0.067795  -6.097 2.37e-09 ***
## dummies[, 1:11]5  -0.593830   0.067795  -8.759  < 2e-16 ***
## dummies[, 1:11]6  -0.474536   0.067795  -7.000 9.64e-12 ***
## dummies[, 1:11]7  -0.534978   0.067795  -7.891 2.40e-14 ***
## dummies[, 1:11]8  -0.405552   0.067795  -5.982 4.57e-09 ***
## dummies[, 1:11]9  -0.315547   0.067795  -4.654 4.31e-06 ***
## dummies[, 1:11]10 -0.105386   0.068245  -1.544    0.123    
## dummies[, 1:11]11 -0.019179   0.068245  -0.281    0.779    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2935 on 440 degrees of freedom
## Multiple R-squared:  0.4275, Adjusted R-squared:  0.4118 
## F-statistic: 27.38 on 12 and 440 DF,  p-value: < 2.2e-16

The p-values show that Jan, Feb, Mar, Oct, and Nov are not significantly different from the December baseline. Not surprising, since these months surround December in the winter half of the year.

ts.plot(soi)

library(MASS)

fit.full <- lm(soi ~ time(soi) + dummies[, 1:11])

stepAIC(fit.full, direction='both')
## Start:  AIC=-1097.73
## soi ~ time(soi) + dummies[, 1:11]
## 
##                   Df Sum of Sq    RSS      AIC
## <none>                         37.911 -1097.73
## - time(soi)        1    2.4091 40.320 -1071.82
## - dummies[, 1:11] 11   25.7286 63.640  -885.08
## 
## Call:
## lm(formula = soi ~ time(soi) + dummies[, 1:11])
## 
## Coefficients:
##       (Intercept)          time(soi)   dummies[, 1:11]1  
##         13.490139          -0.006693           0.041992  
##  dummies[, 1:11]2   dummies[, 1:11]3   dummies[, 1:11]4  
##          0.094418          -0.053419          -0.413335  
##  dummies[, 1:11]5   dummies[, 1:11]6   dummies[, 1:11]7  
##         -0.593830          -0.474536          -0.534978  
##  dummies[, 1:11]8   dummies[, 1:11]9  dummies[, 1:11]10  
##         -0.405552          -0.315547          -0.105386  
## dummies[, 1:11]11  
##         -0.019179

Output is unexpected: no individual months are dropped. (From the table, the dummies enter the model as a single matrix term with 11 degrees of freedom, so stepAIC() can only keep or drop all eleven columns at once; the professor could not explain this at the time.)

Creating model by selectively choosing months:

fit2 <- lm(soi ~ time(soi) + dummies[,c(2,4,5,6,7,8,9,10)])
summary(fit2)
## 
## Call:
## lm(formula = soi ~ time(soi) + dummies[, c(2, 4, 5, 6, 7, 8, 
##     9, 10)])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.86147 -0.20390 -0.01137  0.22124  0.84107 
## 
## Coefficients:
##                                          Estimate Std. Error t value
## (Intercept)                             13.494042   2.489396   5.421
## time(soi)                               -0.006699   0.001264  -5.298
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]2   0.102043   0.053254   1.916
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]4  -0.405709   0.053254  -7.618
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]5  -0.586204   0.053254 -11.008
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]6  -0.466909   0.053255  -8.767
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]7  -0.527350   0.053255  -9.902
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]8  -0.397924   0.053256  -7.472
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]9  -0.307918   0.053257  -5.782
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]10 -0.097760   0.053825  -1.816
##                                         Pr(>|t|)    
## (Intercept)                             9.78e-08 ***
## time(soi)                               1.85e-07 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]2     0.056 .  
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]4  1.57e-13 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]5   < 2e-16 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]6   < 2e-16 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]7   < 2e-16 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]8  4.25e-13 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]9  1.40e-08 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]10    0.070 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2932 on 443 degrees of freedom
## Multiple R-squared:  0.4247, Adjusted R-squared:  0.413 
## F-statistic: 36.34 on 9 and 443 DF,  p-value: < 2.2e-16
par(mfrow=c(2,1))
plot(time(soi), fitted(fit2), col='red', type='l') # fitted model
#lines(soi) # actual observations, too confusing when superimposed
plot(soi) # actual observations, separate plot

par(mfrow=c(1,1))
acf(resid(fit2))

The ACF of the residuals is clearly not just white noise, which means we should be able to find a better model.

Example Consider the bike data which represents four years of quarterly mountain bike sales. Fit a time series regression model and forecast out one year.

quarter <- rep(1:4, 4)
bike <- ts(c(10,31,43,16,11,33,45,17,13,34,48,19,15,37,51,21), frequency=4, start=c(2007, 1)) #starts 1st qtr 2007
ts.plot(bike)

There is a seasonal component. Many more bikes are sold in the summer than the winter.

(Data is obviously fake; real data would never exhibit such a clear pattern.)

fit.bike <- lm(bike ~ time(bike) + factor(quarter))
summary(fit.bike)
## 
## Call:
## lm(formula = bike ~ time(bike) + factor(quarter))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -0.75  -0.25  -0.25   0.25   1.25 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -4004.7500   302.7930 -13.226 4.25e-08 ***
## time(bike)           2.0000     0.1508  13.266 4.12e-08 ***
## factor(quarter)2    21.0000     0.4782  43.913 1.04e-13 ***
## factor(quarter)3    33.5000     0.4827  69.408 6.89e-16 ***
## factor(quarter)4     4.5000     0.4900   9.184 1.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6742 on 11 degrees of freedom
## Multiple R-squared:  0.9983, Adjusted R-squared:  0.9977 
## F-statistic:  1645 on 4 and 11 DF,  p-value: 3.439e-15

The p-values are all highly significant. There’s no need for model selection.

acf(resid(fit.bike))

Finally, a good model. The ACF of residuals is white noise.

Projecting into the future.

pred <- c(fitted(fit.bike), -4004.75+2*c(2011, 2011.25, 2011.5, 2011.75) + c(0,21,33.5,4.5)) # fitted values plus 2011 projection
plot(x=c(time(bike), 2011, 2011.25, 2011.5, 2011.75), y=pred, col='red', type='l', xlab='') 
lines(bike, type='o') # observed values

April 18, 2016

Differencing

A common method for achieving stationarity in nonstationary cases is the first difference \[ \nabla y_t = y_t - y_{t-1} \] This method of transforming to stationarity is particularly useful for time series with trend.

Consider the previous example \[ y_t = \beta_0 + \beta_1 t + w_t \] Here, \[ \begin{align*} \nabla y_t &= \beta_0 + \beta_1 t + w_t - [\beta_0 + \beta_1(t-1) + w_{t-1}] \\ &= \beta_1 + w_t - w_{t-1} \end{align*} \] Is \(\nabla y_t\) stationary? \(E(\nabla y_t) = \beta_1\), therefore not a function of time.

\[ \gamma_{\nabla y}(h) = \begin{cases} 2 \sigma^2_w, & h=0 \\ - \sigma^2_w, & |h|=1 \\ 0, & |h| > 1 \end{cases} \] It’s only a function of the lag \(h\), so \(\nabla y_t\) is stationary.
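A quick simulation (slope, intercept, and noise level chosen arbitrarily) illustrates both results: the sample mean of \(\nabla y_t\) is close to \(\beta_1\), and the lag-1 sample autocorrelation is close to \(\gamma(1)/\gamma(0) = -1/2\).

```r
set.seed(1)
n  <- 500
t  <- 1:n
w  <- rnorm(n, sd=1)
y  <- 2 + 0.5*t + w         # y_t = beta0 + beta1*t + w_t with beta1 = 0.5
dy <- diff(y)               # nabla y_t = beta1 + w_t - w_{t-1}

mean(dy)                    # close to beta1 = 0.5
acf(dy, plot=FALSE)$acf[2]  # lag-1 autocorrelation, close to -0.5
```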

Example Regression vs. differencing on gtemp

plot.ts(gtemp) # reminder of plot

par(mfrow=c(2,1))
fit <- lm(gtemp ~ time(gtemp), na.action=NULL)
plot(resid(fit)) # did this on Wednesday
plot(diff(gtemp))

par(mfrow=c(3,1))
acf(gtemp, 48, main='gtemp')
acf(resid(fit), 48, main='detrended')
acf(diff(gtemp), 48, main='first difference')

Higher order differences are defined as successive applications of the operator \(\nabla\).

For example, the second order difference is \[ \begin{align*} \nabla^2 y_t &= \nabla(y_t - y_{t-1}) \\ &= y_t - y_{t-1} - (y_{t-1} - y_{t-2}) \\ &= y_t - 2y_{t-1} + y_{t-2} \end{align*} \] (it is the difference of the difference). If the model also contains a quadratic trend term, we can show that taking the second order difference reduces the model to have a mean function that is independent of time (possibly stationary).

Problem Let \(y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + w_t\).
Determine if \(v_t = \nabla^2 y_t\) is stationary.

Solution
\[ \begin{align*} v_t &= y_t - 2y_{t-1} + y_{t-2} \\ &= \beta_0 + \beta_1 t +\beta_2 t^2 + w_t - 2[\beta_0 + \beta_1(t-1) + \beta_2(t-1)^2 + w_{t-1}] + \beta_0 + \beta_1(t-2) + \beta_2(t-2)^2 + w_{t-2} \\ &\vdots \\ &= 2 \beta_2 + w_t - 2w_{t-1} + w_{t-2} \end{align*} \] \(E(v_t) = 2 \beta_2\), independent of time.

\[ \gamma_v(h) = \begin{cases} 6 \sigma^2_w, & h = 0 \\ -4 \sigma^2_w, & |h| = 1 \\ \sigma^2_w, & |h| = 2 \\ 0, & |h| > 2 \end{cases} \]
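A numeric check: with a pure quadratic trend (no noise), the second difference is exactly the constant \(2\beta_2\). The coefficients below are arbitrary.

```r
t  <- 1:20
y  <- 1 + 2*t + 3*t^2          # beta0 = 1, beta1 = 2, beta2 = 3
d2 <- diff(y, differences=2)   # nabla^2 y_t
unique(d2)                     # all equal 2*beta2 = 6
```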

Transformations

A transformation that damps the larger peaks of a time series and emphasizes lower values may be useful in reducing nonstationarity due to changing variance.

An example is the log transformation, \(y_t = \log x_t\), usually natural log.

Example (varve - glacial thickness)

par(mfrow=c(3,1))
plot(varve)
plot(log(varve))
plot(diff(log(varve), differences=1))

par(mfrow=c(1,1))

Another transformation that is often useful in count data is the square root transformation.

More general transformations are available within the Box-Cox family of power transformations, defined in terms of \(x_t^{\alpha}\) for some \(\alpha\) to be estimated.

library(MASS)
boxcox((diff(varve) - min(diff(varve)) + 2) ~ time(varve)[2:634])

plot(diff(varve))

Linear Filters

We may define general linear filters to do kinds of smoothing and roughening of a time series to enhance signals and attenuate noise. Consider the general linear combination of past and future values \[ y_t = \sum_{j=-\infty}^{\infty} a_j x_{t-j} \] An example is the first difference, where \(a_0=1, a_1=-1,\) and \(a_j=0\) otherwise.
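In R, stats::filter() applies exactly such a linear combination; with coefficients \(a_0 = 1, a_1 = -1\) and sides=1 it reproduces the first difference. A minimal check on an arbitrary random walk:

```r
set.seed(42)
x  <- cumsum(rnorm(100))                          # a random walk
fd <- stats::filter(x, filter=c(1, -1), sides=1)  # y_t = x_t - x_{t-1}
all.equal(as.numeric(fd)[-1], as.numeric(diff(x)))  # TRUE (the first value is NA)
```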

Example (Mortality Rates) Average weekly cardiovascular mortality in LA County 1970-1979.

ma5 <- filter(cmort, sides=2, rep(1,5)/5) # 5th order moving average
ma53 <- filter(cmort, sides=2, rep(1, 53)/53)
plot.ts(cmort)
lines(ma5, col='red')
lines(ma53, col='blue') # blue line is truncated b/c losing 26 data points at each end - required to create moving average

plot.ts(ma5) # helps to remove noise, but not so much as to lose the signal.

week <- rep(1:52, 10)[-c(509:520)]

ma7 <- filter(cmort, sides=2, rep(1, 7)/7)
fit7 <- lm(ma7 ~ time(cmort) + as.factor(week))
plot(time(cmort)[-c(1:3, 506:508)], fitted(fit7), type='l') # fitted values according to time series model
lines(ma7, col='red') #7th order moving avg with filter applied

acf(resid(fit7)) # acf of residuals

Fit looked pretty good, but clearly haven’t included all important variables.

(differencing gets rid of trends that aren’t constant)

April 20, 2016

Other Smoothing Techniques

The reality of some time series data is that the behavior may change over time.

Thus using one static smoother across the entire series may not be a good idea.

The following methods use local smoothing resulting in a new series \(\hat{f}_t\).

Technique 1: Kernel Smoother
This is a weighted smoothing function (usually symmetric). \[ \hat{f}_t = \sum_{i=1}^n w_i(t)\, x_i \] where \[ w_i(t)= \frac{K\left(\frac{t-i}{b}\right)}{\sum_{j=1}^n K\left(\frac{t-j}{b}\right)} \] and \(b\) is the bandwidth. By far, the most commonly used kernel is the Gaussian kernel.
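To make the weight formula concrete, here is a minimal hand-rolled Gaussian kernel smoother (the bandwidth `b` and the test series are arbitrary choices; ksmooth() in the next example is the built-in equivalent):

```r
kernel_smooth <- function(x, b) {
  n <- length(x)
  sapply(1:n, function(t) {
    k <- dnorm((t - 1:n)/b)  # Gaussian kernel K((t-i)/b)
    sum(k/sum(k) * x)        # weighted average with weights w_i(t)
  })
}

set.seed(7)
x    <- sin(2*pi*(1:200)/50) + rnorm(200, sd=0.5)  # noisy periodic series
fhat <- kernel_smooth(x, b=5)
```

The smoothed series `fhat` should be visibly less rough than `x` (e.g. compare `var(diff(fhat))` with `var(diff(x))`).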

Example (cmort)

plot.ts(cmort)
lines(ksmooth(time(cmort), cmort, 'normal', bandwidth=8/52), col='red') # 8 week bandwidth

Techniques 2 and 3: Lowess and Nearest Neighbor This approach to smoothing a time series uses nearest neighbor regression.

The idea is to use the data \[ \{x_{t-k/2} , \dots , x_t, \dots, x_{t + k/2} \} \] to predict \(x_t\) using linear regression, yielding \(\hat{f}_t\).

Example (cmort)

s1 <- supsmu(time(cmort), cmort, span=0.01) # seasonal component
plot(cmort, type='p')
lines(s1, col='red')

plot(cmort - s1$y) # residuals after fit

acf(cmort - s1$y)

More work is needed to isolate the noise.

Lowess does something very similar: a certain proportion of nearest neighbors to \(x_t\) are included in the weighting scheme, with points closer to \(x_t\) receiving more weight.

Then, a robust weighted regression is used to smooth the series.

Example (cmort)

plot(cmort, type='p') # again, the plot
lines(lowess(cmort, f=0.02), col='red') # use an `f` as you see appropriate

plot(cmort, type='p') # again, the plot
lines(lowess(cmort, f=0.04), col='red') # results in smoother line

Technique 4: Spline
A spline is essentially polynomial regression done on partitions of the data, whose boundary points are called knots. A popular choice is to fit third-order polynomials within each partition, called cubic splines.

Example (cmort)

plot(cmort, type='p')
lines(smooth.spline(time(cmort), cmort), col='red')

Technique 5: Exponential Smoothing (to be covered for the remainder of the class)
This provides a forecasting method that is most effective when the trend and seasonal factor may be changing over time.

Simple Exponential Smoothing
Is used for forecasting when there is no trend or seasonal pattern.

In general, we start with an initial estimate \(\ell_0\) by averaging the first \(k\) observations (equal weighting). Then, given the estimate \(\ell_{t-1}\) at time \(t-1\), we get our estimate of \(\ell_t\) at time \(t\) using the smoothing equation \[ \ell_t = \alpha y_t + (1 - \alpha) \ell_{t-1} \] where \(\alpha \in (0,1)\) is a weight.

If \(\alpha\) is big, then the mean function is changing rapidly where the current observation is weighted much more than previous observations.

We select \(\alpha\) by minimizing the sum of squared error (SSE).
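The smoothing recursion and the SSE criterion for choosing \(\alpha\) are easy to code by hand; a minimal sketch (initializing \(\ell_0\) at the first observation, one common convention, rather than an average of the first \(k\)):

```r
ses <- function(y, alpha) {
  l <- numeric(length(y))
  l[1] <- y[1]                       # initialize level at the first observation
  for (t in 2:length(y))
    l[t] <- alpha*y[t] + (1 - alpha)*l[t - 1]
  l
}

sse <- function(y, alpha) {
  l <- ses(y, alpha)
  sum((y[-1] - l[-length(y)])^2)     # one-step-ahead squared errors
}

set.seed(3)
y <- rnorm(100) + 5                  # arbitrary level-only series
alphas <- seq(0.01, 0.99, by=0.01)
alphas[which.min(sapply(alphas, sse, y=y))]  # alpha minimizing SSE
```

Note that with \(\alpha = 1\) the level simply tracks the observations, \(\ell_t = y_t\), while small \(\alpha\) averages heavily over the past.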

Example (soi)

hw <- HoltWinters(soi, gamma=0, beta=0)
plot(hw)

hw$alpha
## [1] 0.3793885
plot(hw$fitted)

A prediction interval for \(y_{t + \tau}\), based on observations up to and including time \(t\), is \[ \ell_t \pm z_{\alpha/2} s \sqrt{1 + (\tau - 1)\alpha^2} \] where \(s=\sqrt{\frac{SSE}{t-1}}=\sqrt{\left(\sum_{j=1}^t (y_j - \ell_{j-1})^2\right)\big/(t-1)}\)

s <- sqrt(hw$SSE/(length(soi) - 1)) # not necessary, but it's how it's manually calculated
s
## [1] 0.3950483
p <- predict(hw, 50, prediction.interval=TRUE)

plot(hw, p)

# red lines on the right is l_t, l_t+1, ... 50 pts out
# blue lines is the prediction interval
#   the farther out you go, the wider the interval

Holt’s Exponential Smoothing
This is essentially simple exponential smoothing that allows for a changing trend (but still no seasonal factor).

We let \(b_{t-1}\) be the estimated growth rate at time \(t-1\).

The estimate at the level at time \(t\) is \[ \ell_t = \alpha y_t + (1- \alpha)(\ell_{t-1} + b_{t-1}) \] We estimate the growth rate in time period \(t\) by \[ b_t = \gamma (\ell_t - \ell_{t-1}) + (1- \gamma)b_{t-1} \] where \(\gamma \in (0,1)\) is another smoothing parameter.

# in book, gamma refers to growth
# in R, gamma refers to seasonal component
# in book, Kerr thinks seasonal is delta
hw <- HoltWinters(soi, gamma=0)
p <- predict(hw, 50, prediction.interval=TRUE)
plot(hw, p)

#hw$gamma
#hw$beta

Holt-Winters Seasonal Methods
Now accounting for seasonality.

There are two types: additive and multiplicative.

Additive: the model is \[ y_t = \beta_0 + \beta_1 t + SN_t + \varepsilon_t \] we have \[ \begin{align*} \ell_t &= \alpha(y_t - sn_{t-L}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ b_t &= \gamma(\ell_t - \ell_{t-1}) + (1 - \gamma)b_{t-1} \\ sn_t &= \delta(y_t - \ell_t) + (1-\delta)sn_{t-L} \end{align*} \] where \(L\) is the number of seasons per year (e.g. \(L=12\) for monthly data).

hw <- HoltWinters(soi)
p <- predict(hw, 50, prediction.interval=TRUE)
plot(hw, p)

#hw$gamma # nonzero means the seasonal component is being updated (gamma is seasonal in R)
#hw$beta  # zero, means no growth/trend

Multiplicative: Similar to the additive, only now the seasonal component is dependent on the trend.

hw <- HoltWinters(soi, seasonal='mult')
hw$gamma
##     gamma 
## 0.9353537
hw$beta
##         beta 
## 9.795319e-18
hw$alpha
##       alpha 
## 7.08214e-07
p <- predict(hw, 50, prediction.interval=TRUE)
plot(hw, p)

This looks bad. Clearly, multiplicative method is not appropriate here.

The best that we examined today (additive), with a 1-year forecast smoothing model, is

hw <- HoltWinters(soi)
p <- predict(hw, 12, prediction.interval=TRUE)
plot(hw, p)

This ends the material in Chapter 2.

April 25, 2016

Box-Jenkins Models

Classical regression is often insufficient for getting all the information out of a time series.

For example, when we plotted the ACF of the residuals for any of the previous regression problems, we never ended up with something that looked like white noise.

Another approach to modelling a time series is to regress an observation on its own lagged values (or on lagged noise terms), motivating the autoregressive (AR), moving average (MA), and combined autoregressive moving average (ARMA) models.

Additionally, we will discuss extensions for non-stationary time series: autoregressive integrated moving average (ARIMA) models and their seasonal extension, seasonal autoregressive integrated moving average (SARIMA) models.

Autoregressive Models

The general model for AR(\(p\)) is written as \[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t \] Therefore, the current value is expressed as a linear combination of the past \(p\) values of the series with an additive noise component.

(This is similar to a linear regression model except with \(\phi_i\) instead of \(\beta_i\).)

It can be shown that for this model, the ACF tails off gradually while the PACF cuts off after lag \(p\).

Example

set.seed(123)
ar2 <- arima.sim(model=list(order=c(2,0,0), ar=c(0.5, 0.2)), n=200)
plot(ar2)

acf(ar2)  # sample acf

pacf(ar2) # sample pacf

The mean function of the above example is zero. If the mean of \(x_t\) is not zero, replace \(x_t\) with \(x_t - \mu\) so the model becomes \[ x_t = \delta + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t \] where \(\delta = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p)\).

Definitions Let the backshift operator \(B\) shift the series back one time unit, i.e. \[ B x_t = x_{t-1} \] We can express the zero mean AR(\(p\)) model as \[ (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) x_t = w_t \] or even more concisely as \[ \phi (B) x_t = w_t \] where \(\phi (B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p\) is a polynomial with roots (solutions of \(\phi(B)=0\)) outside the unit circle.

The restriction is necessary to express the solution of \(x_t\) in terms of present and past values of \(w_t\).

The solution has the form \[ x_t = \psi(B) w_t \] where \(\psi(B) = \sum_{j=0}^{\infty} \psi_j B^j\) is an infinite polynomial with coefficients determined by equating coefficients of \(B\) in \[ \psi(B) \phi(B)=1 \] Here, \(\psi_0 = 1\).
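R's stats function ARMAtoMA() performs this coefficient matching, returning the \(\psi\)-weights \(\psi_1, \psi_2, \dots\) for a given ARMA model. For an AR(1) with \(\phi_1 = 0.5\), the weights come out as \(\phi_1^j\):

```r
# psi-weights for the AR(1) model x_t = 0.5 x_{t-1} + w_t
ARMAtoMA(ar=0.5, lag.max=5)
## 0.5 0.25 0.125 0.0625 0.03125  (i.e. 0.5^j)
```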

Example AR(1). Suppose we have the model \[ x_t = \phi_1 x_{t-1} + w_t \ \Longrightarrow \ x_t - \phi_1 x_{t-1} = (1 - \phi_1 B)x_t = w_t \] To solve for \(x_t\), we use the previous result to get \[ (1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - \phi_1 B) = 1 \] Setting the coefficient of \(B\) to zero, we have \[ \psi_1 - \phi_1 = 0 \ \Leftrightarrow \ \psi_1 = \phi_1 \] For \(B^2\), we would get \[ \psi_2 - \phi_1 \psi_1 = 0 \ \Leftrightarrow \ \psi_2 = \phi_1^2 \] Continuing, we obtain \(\psi_j = \phi_1^j\) and the representation is \[ \psi(B) = 1 + \sum_{j=1}^{\infty} \phi_1^j B^j \] and we have \[ x_t = \sum_{j=0}^{\infty} \phi_1^j w_{t-j} \] Alternatively, for the AR(1), iterating it backwards \(k\) times, we get \[ \begin{align*} x_t = \phi_1 x_{t-1} + w_t &= \phi_1(\phi_1 x_{t-2} + w_{t-1}) + w_t \\ &= \phi_1^2 x_{t-2} + \phi_1w_{t-1} + w_t \\ &\vdots \\ &= \phi_1^k x_{t-k} + \sum_{j=0}^{k-1} \phi_1^j w_{t-j} \end{align*} \] This suggests that, by continuing to iterate backwards and provided \(|\phi_1| < 1\) and \(x_t\) is stationary, we can represent this model as \[ x_t = \sum_{j=0}^{\infty} \phi_1^j w_{t-j} \] Is the AR(1) process stationary? \[ E(x_t) = \sum_{j=0}^{\infty} \phi_1^j E(w_{t-j}) = 0 \] and the autocovariance function is \[ \begin{align*} \gamma_x(h) &= E\left[\left(\sum_{j=0}^{\infty} \phi_1^j w_{t+h -j} - 0 \right) \left(\sum_{k=0}^{\infty} \phi_1^k w_{t-k} - 0 \right) \right] \\ &= \sigma_w^2 \sum_{j=0}^{\infty} \phi_1^j \phi_1^{j+h} \\ &= \sigma_w^2 \phi_1^h \sum_{j=0}^{\infty} \phi_1^{2j} \\ &= \frac{\sigma_w^2 \phi_1^h}{1 - \phi_1^2}, \ \ \ h \geq 0 \end{align*} \] Since \(\mu_x\) and \(\gamma_x(h)\) are not functions of time, \(x_t\) is stationary.

We can use the symmetry of the autocovariance function for the case \(h < 0\).

Thus, the ACF of an AR(1) process is \[ \rho_x(h) = \frac{\gamma_x(h)}{\gamma_x(0)} = \phi_1^h \] It is interesting to note that the ACF can be computed iteratively, since \(\rho_x(h) = \phi_1 \rho_x (h - 1)\).
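This theoretical ACF can be checked with ARMAacf(), the function the professor recommends later in these notes:

```r
rho <- ARMAacf(ar=0.9, lag.max=5)  # theoretical ACF of an AR(1) with phi_1 = 0.9
as.numeric(rho)                    # equals 0.9^(0:5): 1, 0.9, 0.81, 0.729, ...
```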

Example

par(mfrow=c(2, 1))
pos <- arima.sim(list(order=c(1, 0, 0), ar=0.9), n=100)
neg <- arima.sim(list(order=c(1, 0, 0), ar=-0.9), n=100)
                 
plot(pos, main=expression(AR(1)~~phi==.9)) # pos correlation
plot(neg, main=expression(AR(1)~~phi==-.9)) # neg corr

acf(pos)
acf(neg)

pacf(pos)
pacf(neg)

par(mfrow=c(1, 1))

How would I then forecast, say \(x_{t+1}\)?

\(x_{t+1} = \phi_1 x_t + w_{t+1}\)

April 27, 2016

Causality

An AR process is said to be causal if it does not depend on the future, such as the AR(1) when \(|\phi| < 1\).

Clearly, if \(|\phi| \geq 1\), the previous representation does not hold.

Instead, write \(x_{t+1}=\phi x_t + w_{t+1}\), in which case, \[ \begin{align*} x_t &= \phi^{-1}x_{t+1} - \phi^{-1}w_{t+1} \\ &= \phi^{-1}(\phi^{-1}x_{t+2} - \phi^{-1}w_{t+2}) - \phi^{-1}w_{t+1} \\ &\vdots \\ &= \phi^{-k} x_{t+k} - \sum_{j=1}^{k} \phi^{-j} w_{t+j} \end{align*} \] Since \(|\phi |^{-1} < 1\), this result suggests the stationary, future-dependent AR(1) representation \[ x_t = - \sum_{j=1}^{\infty} \phi^{-j} \underbrace{w_{t+j}}_{future} \] which is useless for forecasting because it depends on the future.

Making It Causal
Suppose we have the previous non-causal stationary Gaussian process with zero mean and \[ \begin{align*} \gamma_x(h) &= \text{Cov}(x_{t+h}, x_t) \\ &= \text{Cov} \left(- \sum_{j=1}^{\infty} \phi^{-j} w_{t+h+j} , - \sum_{k=1}^{\infty} \phi^{-k} w_{t+k} \right) \\ &= \sigma_w^2 \phi^{-2} \frac{\phi^{-h}}{(1- \phi^{-2})} \end{align*} \] Consider the causal process defined by \[ y_t = \phi^{-1}y_{t-1} + v_t \] where \(v_t \stackrel{iid}{\sim} N(0, \underbrace{\sigma_w^2 \phi^{-2}}_{\sigma_v^2})\).

Then \(y_t\) is stochastically equal to the \(x_t\) process (all finite distributions of the processes are the same).

The Gaussian assumption applies to both and the mean and autocovariance functions are the same.

Example Let \(x_t = \underbrace{2}_{\phi} x_{t-1} + w_t\) with \(\sigma^2_w=1\). Then \(y_t = \frac{1}{2} y_{t-1} + v_t\) with \(\sigma_v^2=\sigma_w^2 \phi^{-2}=\frac{1}{4}\) is an equivalent causal process.

Moving Average Models

The general model of an MA(q) is written as \[ x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots+ \theta_q w_{t-q} \] Here, the white noise terms on the right-hand side are combined linearly to form the observed data.

It can be shown for this model that the ACF cuts off after lag \(q\), while the PACF tails off gradually.

We can write the model as \[ x_t = \theta(B)w_t \] where \(\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q\).

Note that, unlike the AR model, the MA process is stationary for any values of the parameters \(\theta_1, \theta_2, \dots, \theta_q\).

Example Consider the MA(1) model \[ x_t = w_t + \theta w_{t-1} \] Find the autocovariance and ACF for the MA(1) process.

Solution
\[ \begin{align*} \gamma_x(h) &= \text{Cov}(x_{t+h}, x_t) \\ &= \text{Cov}(w_{t+h} + \theta w_{t+h-1}, w_t + \theta w_{t-1}) \\ &= \begin{cases} (1 + \theta^2)\sigma_w^2, & h=0 \\ \theta \sigma_w^2, & |h|=1 \\ 0, & |h| > 1 \end{cases} \end{align*} \] Therefore, \[ \rho_x(h) = \begin{cases} 1, & h=0 \\ \frac{\theta}{1 + \theta^2}, & |h|=1 \\ 0, & |h| > 1 \end{cases} \]
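ARMAacf() confirms the derivation; for \(\theta = 2\) (the value used in the simulation that follows), \(\rho(1) = 2/(1+2^2) = 0.4\) and \(\rho(h)=0\) beyond lag 1:

```r
rho <- ARMAacf(ma=2, lag.max=3)  # theoretical ACF of an MA(1) with theta = 2
as.numeric(rho)                  # 1, 0.4, 0, 0
```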

ma1 <- arima.sim(list(order=c(0,0,1), ma=2), n=500)
plot(ma1, ylab='x', main=(expression(MA(1)~~theta==2)) )

acf(ma1)

pacf(ma1)

Invertibility

This is essentially the same as causality, only switching the roles.

In other words, put \(w_t\) in terms of the \(x_t\)’s and make sure the roots are outside the unit circle.

Example Note that for an MA(1) model, \(\rho(h)\) is the same for \(\theta\) and \(\frac{1}{\theta}\). In addition, the pair \(\sigma_w^2=1\) and \(\theta=5\) yield the same autocovariance function as the pair \(\sigma_w^2=25\) and \(\theta= \frac{1}{5}\).

Thus the MA(1) processes \[ \begin{align*} & x_t = w_t + \frac{1}{5} w_{t-1}, \ \ \ w_t \stackrel{iid}{\sim} N(0, 25) \tag*{(1)} \\ \text{and}& \\ & x_t = v_t + 5 v_{t-1}, \ \ \ v_t \stackrel{iid}{\sim} N(0, 1) \tag*{(2)} \end{align*} \] are the same because of normality.

Problem Show that the above example is true.

Solution For the MA(1) process parameter \(\frac{1}{\theta}\), the ACF when \(h=1\) is \[ \begin{align*} \rho(1) &= \frac{\frac{1}{\theta}}{\left(1 + \frac{1}{\theta^2}\right)} \\ &= \frac{\frac{1}{\theta}}{\frac{\theta^2 + 1}{\theta^2}} \\ &= \frac{\theta}{1 + \theta^2} \end{align*} \] (1) \[ \begin{align*} \gamma(0) &= (1 + \theta^2)\sigma^2_w = \left(1 + \frac{1}{25} \right)25 = 26 \\ \gamma(1) &= \theta \sigma^2_w = \frac{1}{5} 25 = 5 \end{align*} \] (2) \[ \begin{align*} \gamma(0) &= ( 1 + 5^2)1 = 26 \\ \gamma(1) &= 5 \cdot 1 = 5 \end{align*} \]
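These calculations are easy to verify numerically, using \(\gamma(0) = (1+\theta^2)\sigma_w^2\) and \(\gamma(1) = \theta\sigma_w^2\):

```r
gamma0 <- function(theta, sigma2) (1 + theta^2)*sigma2  # gamma(0)
gamma1 <- function(theta, sigma2) theta*sigma2          # gamma(1)

c(gamma0(1/5, 25), gamma0(5, 1))  # both 26
c(gamma1(1/5, 25), gamma1(5, 1))  # both 5
```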

ma1 <- arima.sim(list(order=c(0,0,1), ma=0.2), n=200, sd=5)  # sd is an argument of arima.sim, not of the model list
ma1.2 <- arima.sim(list(order=c(0,0,1), ma=5), n=200, sd=1)
plot(ma1, ylab='x', main=(expression(MA(1)~~theta==0.2)) )

plot(ma1.2, ylab='x', main=(expression(MA(1)~~theta==5)) )

Since we only observe \(x_t\), and not \(w_t\), we need a way to decide which of these we should use. Only one of these is invertible.

We start by trying to write them in terms of an infinite AR process (reversing the roles of \(x_t\) and \(w_t\)).

Following the steps to show causality, if \(|\theta| < 1\), then \(w_t = \sum_{j=0}^{\infty} (- \theta)^j x_{t-j}\), which is the desired infinite AR representation of the model.

Hence, given the choice, we will choose the model with \(\theta = \frac{1}{5}\) and \(\sigma_w^2 = 25\) because it is invertible.

Autoregressive Moving Average Models ARMA \((p, q)\)

The general form of the ARMA \((p, q)\) model is written as \[ x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2} - \cdots - \phi_p x_{t-p} = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q} \] where \(\phi_p \neq 0\) and \(\theta_q \neq 0\), with \(\sigma_w^2 > 0\).

The ARMA model can be viewed as a regression on past values with correlated errors. That is, \[ x_t = \beta_0 + \beta_1 x_{t-1} + \cdots + \beta_p x_{t-p} + \varepsilon_t \] where \(\varepsilon_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}\); in the ARMA notation we simply write \(\phi\)’s in place of the \(\beta\)’s.

May 4, 2016

Parameter Redundancy

Consider a white noise process \(x_t = w_t\).

Equivalently, we can write this as \(0.5 x_{t-1} = 0.5 w_{t-1}\) by shifting back one unit of time and multiplying by \(0.5\).

Now, subtract the two representations to obtain \[ \begin{align*} x_t -0.5 x_{t-1} &= w_t - 0.5 w_{t-1} \\ x_t &= 0.5 x_{t-1} - 0.5 w_{t-1} + w_t \end{align*} \] which looks like an ARMA(1, 1) model.

If we write this in operator form, \[ \begin{align*} (1 - 0.5 B)x_t &= (1 - 0.5 B) w_t \\ x_t &= w_t \end{align*} \] It is clear that this is \(x_t = w_t\), a white noise process.

The AR and MA polynomials are defined as \[ \phi (z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p, \ \ \ \phi_p \neq 0 \] and \[ \theta (z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q, \ \ \ \theta_q \neq 0 \] respectively, where \(z\) is a complex number.

Causality
An ARMA(\(p,q\)) is causal if, and only if, the roots of \(\phi(z)\) lie outside the unit circle, that is \(\phi (z) =0\) only when \(|z| >1\).

In the AR(1) case, \(\phi(z)=1 - \phi z\) which has the root, say \(z_0\), of \(z_0 = \frac{1}{\phi}\). Notice that \(|z_0| > 1 \ \ \Longleftrightarrow \ \ |\phi | < 1\).

Invertibility
An ARMA(\(p,q\)) is invertible if and only if the roots of \(\theta(z)\) lie outside the unit circle; that is, \(\theta(z)=0\) only when \(|z|>1\).

Parameter Redundancy, Continued

Example Consider the process \[ x_t = 0.4 x_{t-1} + 0.45 x_{t-2} + w_t + w_{t-1} + 0.25 w_{t-2} \] which looks like an ARMA(2, 2). Or in operator form, \[ (1 - 0.4B - 0.45B^2) x_t = (1 + B + 0.25B^2) w_t \] The associated polynomials are \[ \begin{align*} \phi (z) &= 1 - 0.4 z - 0.45 z^2 \\ &= (1 + 0.5z)(1-0.9z) \\ &\\ \theta(z) &= 1 + z + 0.25z^2 \\ &= (1 + 0.5z)^2 \end{align*} \] After cancelling the common factor \((1 + 0.5z)\), the polynomials become \[ \begin{align*} \phi(z) &= 1 - 0.9z \\ \text{and}&\\ \theta(z) &= 1 + 0.5 z \end{align*} \] or \[ x_t = 0.9 x_{t-1} + w_t + 0.5 w_{t-1} \] It is causal since \(\phi(z) = 0 \ \Longleftrightarrow \ z = \frac{10}{9} > 1\), and the model is invertible since \(\theta(z)=0 \ \Longleftrightarrow \ z=-2\) with \(|-2| = 2 > 1\), outside the unit circle.
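polyroot() makes the cancellation visible: the AR and MA polynomials share the root \(z = -2\). (Coefficients are passed in increasing order of power; the order of the returned roots may vary.)

```r
polyroot(c(1, -0.4, -0.45))  # AR roots: 10/9 and -2
polyroot(c(1, 1, 0.25))      # MA roots: -2 (double root)
```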

Showing invertibility and causality for higher order models can be quite difficult.

For second order polynomials \(az^2 + bz + c\), recall the quadratic formula \[ z = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]

Problem Consider the ARMA(2, 1) process defined by \[ x_t - 0.75x_{t-1} + 0.5625 x_{t-2} = w_t + 1.25 w_{t-1} \] Is this process causal and invertible?

Solution The AR polynomial is \[ \phi(z) = 1 - 0.75z + 0.5625z^2 \] and has roots at \(z=\frac{2}{3}(1 \pm i\sqrt{3})\), which lie outside the unit circle.

Solving in R

polyroot(c(1, -0.75, 0.5625))
## [1] 0.666667+1.154701i 0.666667-1.154701i

Therefore the process is causal.

For \(z= x + yi, |z|= \sqrt{x^2 + y^2}\).

sqrt((2/3)^2 + 1.154701^2)
## [1] 1.333334

Here both roots satisfy \(|z| = \frac{4}{3} > 1\), confirming causality. For invertibility, the MA polynomial is \(\theta(z) = 1 + 1.25z\), with root \(z = -0.8\). Since \(|-0.8| < 1\), the root lies inside the unit circle, so the process is not invertible.

Integrated Models for Nonstationary Data ARIMA(p,d,q)

If \(d\) is a nonnegative integer, then \(x_t\) is an ARIMA(\(p,d,q\)) process if \(y_t = (1 - B)^d x_t\) is an ARMA(\(p,q\)) process.

Example ARIMA(1,1,0). For \(\phi \in (-1, 1)\), \[ \begin{gather} (1 - \phi B) \underbrace{(1 - B) x_t}_{y_t} = w_t \\ \text{or} \ (1 - B - \phi B + \phi B^2)x_t = w_t \\ \text{or} \ x_t = (1 + \phi)x_{t-1} - \phi x_{t-2} + w_t \end{gather} \] Note: this looks like an AR(2) process when estimating.

Example Simulation

ts.sim <- arima.sim(list(order=c(1,1,0), ar=.8), n=200)  
plot.ts(ts.sim)

dif <- diff(ts.sim, 1) # difference operator of 1
plot.ts(dif)

wh <- arima(dif, order=c(1, 0, 0))  # should produce residuals of white noise
wh # estimated coefficients for ar1 process
## 
## Call:
## arima(x = dif, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.8013     0.3298
## s.e.  0.0423     0.3533
## 
## sigma^2 estimated as 1.026:  log likelihood = -286.82,  aic = 579.65
acf(wh$res)

pacf(wh$res)

# with Yule-Walker Estimation
ar.yw(dif, order=1)
## 
## Call:
## ar.yw.default(x = dif, order.max = 1)
## 
## Coefficients:
##      1  
## 0.7932  
## 
## Order selected 1  sigma^2 estimated as  1.064

The ar1 coefficient is similar in both estimates, as is often the case.

# or 
library(forecast)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.1
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:astsa':
## 
##     gas
auto.arima(ts.sim) # automatically selects the ARIMA orders via an information-criterion search
## Series: ts.sim 
## ARIMA(1,1,0)                    
## 
## Coefficients:
##          ar1
##       0.8089
## s.e.  0.0415
## 
## sigma^2 estimated as 1.035:  log likelihood=-287.24
## AIC=578.48   AICc=578.54   BIC=585.08

Professor mentions ARMAacf() and recommends looking into it (apparently would have been useful for the midterm).

May 09, 2016

Forecasting

The goal is to predict future values of a time series, \(x_{n+m}, m=1,2, \dots\) based on the data collected to the present \[ \mathbf{x} = \{x_n , x_{n-1}, \dots, x_1\} \] The minimum mean square error predictor of \(x_{n+m}\) is \[ x^n_{n+m} = E(x_{n+m}| x_n, x_{n-1}, \dots, x_1) \] because the conditional expectation minimizes the mean square error \[ E[\big(x_{n+m}-g(\mathbf{x}) \big)^2] \] where \(g(\mathbf{x})\) is a function of the observations [easiest is linear], i.e. \[ x^n_{n+m} = \alpha_0 + \sum_{k=1}^n \alpha_k x_k \] There are several things required before we can do the forecasting.

First, we need \(g(\mathbf{x})\) which requires knowing the order of the ARMA(\(p,q\)) model.

Then we need to estimate the mean, coefficients, and the white noise variance.

The mean is first taken care of by subtracting the sample mean from the data.

Prediction Error

The mean square m-step-ahead prediction error is \[ p_{n+m}^n = E[(x_{n+m} - x^n_{n+m})^2] \]

Prediction Intervals

In general, \((1 - \alpha)\) prediction intervals are \[ x_{n+m}^n \pm z_{(\alpha/2)}\sqrt{p_{n+m}^n} \] we will not go into the calculation of \(x_{n+m}^n\) nor \(p_{n+m}^n\) since R will give this to us.
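A minimal sketch of how these intervals come out of R (on a simulated AR(1); the coefficient 0.7 is an arbitrary choice): predict() returns both \(x^n_{n+m}\) (in $pred) and \(\sqrt{p^n_{n+m}}\) (in $se), so a 95% interval is just the point forecast plus or minus a normal quantile times the standard error.

```r
set.seed(42)
x <- arima.sim(list(ar = 0.7), n = 200)  # simulated AR(1) for illustration
fit <- arima(x, order = c(1, 0, 0))

fc <- predict(fit, n.ahead = 12)         # m = 1, ..., 12 step-ahead forecasts
U  <- fc$pred + qnorm(0.975) * fc$se     # upper 95% bound
L  <- fc$pred - qnorm(0.975) * fc$se     # lower 95% bound

cbind(forecast = fc$pred, lower = L, upper = U)
```

Note that fc$se grows with the horizon m, so the intervals widen as the forecasts revert toward the mean.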

Estimation

When dealing with pure AR processes, we can use Yule-Walker Estimation to get parameter estimates.

Definition The Yule-Walker equations are given by \[ \begin{gather} \rho (h) = \phi_1 \rho(h-1) + \cdots + \phi_p \rho(h-p), \ \ \ h=1,2, \dots, p \\ \sigma^2_w = \gamma (0) [1 - \phi_1 \rho (1) - \cdots - \phi_p \rho(p)] \end{gather} \] The estimators obtained by replacing \(\gamma(0)\) with its estimator \(\hat{\gamma}(0)\), and \(\rho(h)\) with its estimator, \(\hat{\rho}(h)\), are called the Yule-Walker estimators.

It can be shown that the asymptotic distribution of the Yule-Walker estimators is normal and that \(\hat{\sigma}_w^2\) converges to \(\sigma_w^2\) in probability.
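These equations can also be solved by hand from the sample ACF; here is a sketch on simulated AR(2) data (the coefficients 1.3 and -0.4 are arbitrary) that reproduces the \(\hat{\phi}\) returned by ar.yw(). (ar.yw()'s var.pred will differ slightly from the plug-in \(\hat{\sigma}^2_w\) because it applies a degrees-of-freedom correction.)

```r
set.seed(1)
x <- arima.sim(list(ar = c(1.3, -0.4)), n = 500)  # simulated AR(2)

# sample autocorrelations rho-hat(1), rho-hat(2)
r <- drop(acf(x, lag.max = 2, plot = FALSE)$acf)[2:3]

# solve the p = 2 Yule-Walker system for (phi1, phi2)
R.mat   <- toeplitz(c(1, r[1]))
phi.hat <- solve(R.mat, r)

# white noise variance: gamma-hat(0) * (1 - phi1*rho-hat(1) - phi2*rho-hat(2))
g0       <- drop(acf(x, lag.max = 0, type = "covariance", plot = FALSE)$acf)
sig2.hat <- g0 * (1 - sum(phi.hat * r))

phi.hat  # agrees with ar.yw(x, order.max = 2, aic = FALSE)$ar
```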

An alternative to Yule-Walker (Y-W) estimation is to use the arima() function in R to get the estimates. This uses a combination of maximum likelihood and conditional least squares.

Example (Recruitment Data)

acf(rec) # a reminder of the recruitment data

          # acf tails off slowly, and
pacf(rec) # pacf cuts off after lag 2, suggesting AR(2)

rec.yw <- ar.yw(rec, order=2)
rec.yw$x.mean
## [1] 62.26278
plot.ts(rec) # reminder - not a mean zero process

rec.yw$ar #    phi1 and phi2 coefficients
## [1]  1.3315874 -0.4445447
sqrt(diag(rec.yw$asy.var.coef)) # standard errors; useful for checking for significance of coefficients
## [1] 0.04222637 0.04222637
rec.yw$var.pred # error variance estimate
## [1] 94.79912
# Predictions:
rec.pr <- predict(rec.yw, n.ahead=24)
U <- rec.pr$pred + rec.pr$se # upper bound
L <- rec.pr$pred - rec.pr$se # lower bound
month <- 360:453 # months the prediction is applied to
plot(rec, type='o', xlim=c(1985, 1991), ylab='recruits')
lines(rec.pr$pred, col='red', type='o')
lines(U, col='blue', lty='dashed')
lines(L, col='blue', lty='dashed')

We can also use maximum likelihood for parameter estimation:

rec.mle <- ar.mle(rec, order=2)
rec.mle$x.mean
## [1] 62.26153
rec.mle$ar # compare with `rec.yw$ar`
## [1]  1.3512809 -0.4612736
sqrt(diag(rec.mle$asy.var.coef))  # similar to `sqrt(diag(rec.yw$asy.var.coef))`
## [1] 0.04099159 0.04099159
rec.mle$var.pred
## [1] 89.33597

In general, we can use

# arima(data, order=c(p, d, q)) # general format
rec.arima <- arima(rec, order=c(2,0,0))
rec.arima
## 
## Call:
## arima(x = rec, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1      ar2  intercept
##       1.3512  -0.4612    61.8585
## s.e.  0.0416   0.0417     4.0039
## 
## sigma^2 estimated as 89.33:  log likelihood = -1661.51,  aic = 3331.02

Quality of fit, Y-W

acf(rec.yw$resid, na.action=na.pass)

Seasonal components still persist in the data, as seen around lags 1 and 2 (in years) of the residual ACF.

Example recap:

acf(rec.yw$resid, na.action=na.pass, lag.max=72)

pacf(rec.yw$resid, na.action=na.pass, lag.max=72)

Seasonal Box-Jenkins Models

The multiplicative seasonal autoregressive integrated moving average model, or SARIMA, is \[ \Phi_P (B^s) \phi (B) \nabla_s^D \nabla^d x_t = \alpha + \Theta_Q (B^s) \theta (B) w_t \] where \(w_t\) is the usual Gaussian white noise process. We can denote the general model as \[ ARIMA(p,d,q) \times (P, D, Q)_s \] Here, \(\nabla^d = (1 - B)^d\) and \(\nabla_s^D = (1-B^s)^D\).

May 11, 2016
\(ARIMA(p,d,q) \times (P,D,Q)_S\)

General Approach

  1. Differencing operators are applied first.
  2. Next, the ACF and PACF are evaluated.
  3. Fit AR and/or MA models.
  4. Continue steps 2 and 3 until residuals look like white noise.

Example \(ARIMA(0,1,1) \times (0,1,1)_{12}\)
\[ \begin{align*} (1-B^{12})(1-B)x_t &= (1 + \Theta B^{12})(1 + \theta B)w_t \\ (1 - B - B^{12} + B^{13})x_t &= (1 + \theta B + \Theta B^{12} + \theta \Theta B^{13}) w_t \\ x_t &= x_{t-1} + x_{t-12} - x_{t-13} + w_t + \theta w_{t-1} + \Theta w_{t-12} + \theta \Theta w_{t-13} \end{align*} \]
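The polynomial expansion above can be double-checked numerically: multiplying the two MA polynomials by coefficient convolution reproduces the coefficients \(1, \theta, \Theta, \theta\Theta\) at lags \(0, 1, 12, 13\). The values \(\theta = 0.4\) and \(\Theta = 0.6\) are arbitrary illustrations.

```r
theta <- 0.4; Theta <- 0.6     # arbitrary illustrative values

p1 <- c(1, theta)              # coefficients of 1 + theta*B
p2 <- c(1, rep(0, 11), Theta)  # coefficients of 1 + Theta*B^12

# polynomial multiplication via convolution (see ?convolve);
# result holds the coefficients of B^0, B^1, ..., B^13
prod.coef <- convolve(p1, rev(p2), type = "open")
round(prod.coef, 10)
# nonzero at B^0 (1), B^1 (theta), B^12 (Theta), B^13 (theta*Theta)
```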

Example Federal Reserve Board Production
prodn dataset

plot.ts(prodn) # seems linear

fed <- diff(prodn, 1) # try 1st order difference
plot.ts(fed)

fed <- diff(log(prodn), 1)
plot.ts(fed) # an improvement  

acf(fed)

pacf(fed)

acf(fed, lag.max=120) # increase lag to 10 yrs; seasonal component decreases over time

pacf(fed, lag.max=120) # seasonal component goes away after 3 years

logprodn <- log(prodn)
fed.fit <- arima(logprodn, order=c(0,1,0), seasonal=list(order=c(3,0,0), period=12))
plot.ts(fed.fit$res)

acf(fed.fit$res, lag.max=120)

pacf(fed.fit$res, lag.max=120) # some seasonality still exists in the first few years

# replace the seasonal AR with seasonal differencing
fed.fit <- arima(logprodn, order=c(0,1,0), seasonal=list(order=c(0,1,0), period=12))
acf(fed.fit$res, lag.max=120)

pacf(fed.fit$res, lag.max=120) # the opposite pattern appears at the seasonal lags, suggesting a seasonal MA(1)

fed.fit <- arima(logprodn, order=c(0,1,0), seasonal=list(order=c(0,1,1), period=12))
acf(fed.fit$res, lag.max=120)

pacf(fed.fit$res, lag.max=120)

# Model that Kerr ended up with, though it clearly needs more work
# Nonseasonal 1,1,1; seasonal 2,1,1
fed.fit <- arima(prodn, order=c(1,1,1), seasonal=list(order=c(2,1,1), period=12))
acf(fed.fit$res, lag.max=120)

pacf(fed.fit$res, lag.max=120)

(This is the typical procedure: lots of trial and error.)

auto.arima(logprodn)
## Series: logprodn 
## ARIMA(2,1,0)(1,0,0)[12]                    
## 
## Coefficients:
##          ar1     ar2    sar1
##       0.2141  0.1697  0.8031
## s.e.  0.0517  0.0514  0.0301
## 
## sigma^2 estimated as 0.0003711:  log likelihood=934.11
## AIC=-1860.22   AICc=-1860.12   BIC=-1844.56
# use model as suggested by `auto.arima()`
fed.fit <- arima(logprodn, order=c(2,1,0), seasonal=list(order=c(1,0,0), period=12))
acf(fed.fit$res, lag.max=120)

pacf(fed.fit$res, lag.max=120)

Appears no better than previous models.

On untransformed data:

fed.fit <- arima(prodn, order=c(1,0,2), seasonal=list(order=c(0,1,1), period=12))
acf(fed.fit$res, lag.max=120)

pacf(fed.fit$res, lag.max=120)

Again, not much improvement here.

May 16, 2016
Today we examined three data sets and attempted model fitting. The following are the data sets used, and the resulting model provided by the professor:

  1. wine (from itsmr library)
    Model: \(ARIMA(1,0,1) \times (0,1,1)_{12}\)
  2. airline (AirPassengers)
    Model: \(\log ARIMA(0,1,1) \times (0,1,1)_{12}\)
  3. gnp (from astsa)
    Model: \(ARIMA(2,2,1)\) or \(\log ARIMA(1,1,0)\).
    Note that if the two models fit equally well, \(\log ARIMA(1,1,0)\) has fewer parameters, so it is considered the superior (more parsimonious) model.

In the astsa library, there is the sarima.for() function. For gnp:

# the first number, 5, is the number of forecasts (n.ahead)
# the next three numbers are the nonseasonal order (p, d, q) = (1, 1, 0)
# the three after that are the seasonal order (P, D, Q) - all zero here, no seasonal part
# the last number is the season length S - 4, since gnp is quarterly data
sarima.for(log(gnp), 5, 1, 1, 0, 0, 0, 0, 4) 

## $pred
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2002                            9.165886
## 2003 9.174510 9.182946 9.191316 9.199664
## 
## $se
##             Qtr1        Qtr2        Qtr3        Qtr4
## 2002                                     0.009502408
## 2003 0.015938787 0.021173624 0.025569341 0.029380482

Note that this output is of the transformed log data.